Model: EfficientNet-B4 with custom classification head
Date: February 21, 2026
Dataset: 12,446 images (CT + Ultrasound) | Test set: 1,904 images
Status: ✓ All clinical KPI targets exceeded
The ROC curve shows perfect discrimination on the test set (AUC = 1.0000 to four decimal places). The confusion matrix at the chosen decision threshold confirms zero missed stone cases across all 224 stone images in the test set.
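An AUC of 1.0 and a non-zero false-positive count can coexist, because AUC measures how well the scores rank stone above no_stone images across all thresholds, while the confusion matrix reflects one chosen threshold. The sketch below illustrates this with made-up scores. The function names and toy data are illustrative assumptions, not part of the project code.

```python
def auc_from_scores(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair (the Mann-Whitney U formulation).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


def confusion_counts(labels, scores, threshold):
    """Return (tp, fp, fn, tn), predicting stone when score >= threshold."""
    tp = fp = fn = tn = 0
    for y, s in zip(labels, scores):
        predicted_stone = s >= threshold
        if y == 1 and predicted_stone:
            tp += 1
        elif y == 0 and predicted_stone:
            fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


# Toy scores, invented for illustration (1 = stone, 0 = no_stone).
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.95, 0.90, 0.80, 0.60, 0.40, 0.20, 0.10]

auc = auc_from_scores(labels, scores)                  # every stone outranks every no_stone
counts_low = confusion_counts(labels, scores, 0.5)     # cautious threshold: one false positive
counts_high = confusion_counts(labels, scores, 0.7)    # higher threshold: no errors
print(auc, counts_low, counts_high)
```

In this toy case the ranking is perfect, yet the lower threshold still flags one no_stone image. A recall-oriented threshold can produce the same effect on real data.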
Grad-CAM++ heatmaps show which image regions drove each prediction. Red and yellow areas indicate high model attention. For a valid prediction, these areas should coincide with kidney and urinary tract anatomy.
All 14 false positives are shown below with Grad-CAM overlays. These are no_stone images incorrectly predicted as stone. Typical causes of such errors include cysts mimicking stones, vascular calcifications, and image compression artifacts.
Decision threshold was optimised using F2-score on the validation set. F2 weights recall (sensitivity) twice as heavily as precision, appropriate for a screening tool where missing a stone is more harmful than a false alarm.
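A minimal sketch of F2-based threshold selection follows, assuming the validation set yields one stone probability per image. The helper names (`fbeta`, `best_f2_threshold`) and the toy data are hypothetical, and the actual search in the project scripts may differ.

```python
def fbeta(tp, fp, fn, beta=2.0):
    """F-beta from counts: (1 + b^2) * TP / ((1 + b^2) * TP + b^2 * FN + FP)."""
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


def best_f2_threshold(labels, scores, candidates=None):
    """Scan candidate thresholds and return (threshold, f2) with the highest F2.

    An image is predicted as stone when its score is >= threshold.
    On ties, the lowest candidate threshold is kept.
    """
    if candidates is None:
        candidates = sorted(set(scores))
    best_t, best_f = None, -1.0
    for t in candidates:
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= t)
        fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= t)
        fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < t)
        f = fbeta(tp, fp, fn, beta=2.0)
        if f > best_f:
            best_t, best_f = t, f
    return best_t, best_f


# Toy validation data, invented for illustration (1 = stone, 0 = no_stone).
val_labels = [1, 1, 1, 0, 0, 0]
val_scores = [0.9, 0.7, 0.4, 0.5, 0.3, 0.1]

threshold, f2 = best_f2_threshold(val_labels, val_scores)
print(threshold, f2)
```

Here the search settles on the threshold that keeps all three stone cases at the cost of one false alarm, which is the trade-off F2 is meant to favour.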
| Metric | Value | Target | Status |
|---|---|---|---|
| Sensitivity (Recall) | 1.0000 | ≥ 0.92 | ✓ PASSED |
| Specificity | 0.9917 | ≥ 0.88 | ✓ PASSED |
| AUC-ROC | 1.0000 | ≥ 0.95 | ✓ PASSED |
| Precision | 0.9412 | ≥ 0.85 | ✓ PASSED |
| F2-Score | 0.9877 | ≥ 0.90 | ✓ PASSED |
| False Negatives | 0 | Minimise | ✓ ZERO |
| False Positives | 14 | < 5% of negatives | ✓ 0.83% |
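The threshold-dependent KPIs can be reproduced from the confusion matrix counts. In the sketch below, TP = 224, FN = 0 and FP = 14 come from the report. TN = 1,666 is derived as 1,904 − 224 − 14 and is an inference, not a figure quoted in the report. The function name `kpis` is illustrative.

```python
def kpis(tp, fp, fn, tn):
    """Compute the threshold-dependent KPIs from confusion matrix counts."""
    sensitivity = tp / (tp + fn)          # recall on stone images
    specificity = tn / (tn + fp)          # recall on no_stone images
    precision = tp / (tp + fp)
    # F2 = 5 * P * R / (4 * P + R), i.e. F-beta with beta = 2
    f2 = 5 * precision * sensitivity / (4 * precision + sensitivity)
    false_positive_rate = fp / (fp + tn)  # share of negatives flagged as stone
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f2": f2,
        "false_positive_rate": false_positive_rate,
    }


# TN is derived from the test set size: 1904 - 224 stone images - 14 FP.
report = kpis(tp=224, fp=14, fn=0, tn=1666)
for name, value in report.items():
    print(f"{name}: {value:.4f}")
# sensitivity: 1.0000
# specificity: 0.9917
# precision: 0.9412
# f2: 0.9877
# false_positive_rate: 0.0083
```

These values agree with the table, including the 0.83% false-positive rate (14 of 1,680 negatives).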
The model is ready for API development and deployment. Phase 4 will wrap it in a FastAPI REST endpoint, containerise it with Docker, and serve predictions with Grad-CAM heatmaps over HTTP.
Generated automatically by scripts/generate_report.py · 2026-02-21 21:19 · Kidney Stone CNN v1.0